(54) Voice processing

(57) A voice synthesizing unit performs voice synthesizing processing,
based on the state of emotion of a robot at an emotion/instinct model
unit. For example, in the event that the emotion state of the robot
represents "not angry", synthesized sound of "What is it?" is generated
at the voice synthesizing unit. On the other hand, in the event that the
emotion state of the robot represents "angry", synthesized sound of
"Yeah, what?" is generated at the voice synthesizing unit, to express
the anger. Thus, a robot with a high entertainment nature is provided.
Description

[0001] The present invention relates to a voice processing device, voice
processing method, and recording medium, and particularly (though not
exclusively) relates to a voice processing device, voice processing
method, and recording medium suitably used for a robot having voice
processing functions such as voice recognition, voice synthesizing, and
so forth.
[0002] Heretofore, many robots which output synthesized sound when a
touch switch is pressed (the definition of such robots in the present
specification includes stuffed animals and the like) have been marketed
as toy products.

[0003] However, with conventional robots, the relation between the
pressing operation of the touch switch and synthesized sound is fixed,
so there has been the problem that the user gets tired of the robot.
[0004] Various respective aspects and features of the invention are
defined in the appended claims.
[0005] The present invention has been made in light of the above, and
accordingly, it is an object thereof to provide a robot with a high
entertainment factor.
[0006] A voice processing device according to the present invention
comprises: voice processing means for processing voice; and control
means for controlling voice processing by the voice processing means,
based on the state of the robot.

[0007] The control means may control the voice process based on the
state of actions, emotions or instincts of the robot. The voice
processing means may comprise voice synthesizing means for performing
voice synthesizing processing and outputting synthesized sound, and the
control means may control the voice synthesizing processing by the voice
synthesizing means, based on the state of the robot.

[0008] The control means may control phonemics information and pitch
information of synthesized sound output by the voice synthesizing means,
and the control means may also control the speech speed or volume of
synthesized sound output by the voice synthesizing means.

[0009] The voice processing means may extract the pitch information or
phonemics information of the input voice, and in this case, the emotion
state of the robot may be changed based on the pitch information or
phonemics information, or the robot may take actions corresponding to
the pitch information or phonemics information.

[0010] The voice processing means may comprise voice recognizing means
for recognizing input voice, and the robot may take actions
corresponding to the reliability of the voice recognition results output
from the voice recognizing means, or the emotion state of the robot may
be changed based on the reliability.
[0011] The control means may recognize the action which the robot is
taking, and control voice processing by the voice processing means based
on the load regarding that action. Also, the robot may take actions
corresponding to resources which can be appropriated to voice processing
by the voice processing means.
[0012] The voice processing method according to the present invention
comprises: a voice processing step for processing voice; and a control
step for controlling voice processing in the voice processing step,
based on the state of the robot.

[0013] The recording medium according to the present invention records
programs comprising: a voice processing step for processing voice; and a
control step for controlling voice processing in the voice processing
step, based on the state of the robot.
[0014] With the voice processing device, voice processing method, and
recording medium according to the present invention, voice processing is
controlled based on the state of the robot.
[0015] The invention will now be described by way of example with
reference to the accompanying drawings, throughout which like parts are
referred to by like references, and in which:

Fig. 1 is a perspective view illustrating an external configuration
example of an embodiment of a robot to which the present invention has
been applied;
Fig. 2 is a block diagram illustrating an internal configuration example
of the robot shown in Fig. 1;
Fig. 3 is a block diagram illustrating a functional configuration
example of the controller 10 shown in Fig. 2;
Fig. 4 is a diagram illustrating an emotion/instinct model;
Figs. 5A and 5B are diagrams describing the processing in the
emotion/instinct model unit 51;
Fig. 6 is a diagram illustrating an action model;
Fig. 7 is a diagram for describing the processing of the attitude
transition mechanism unit 53;
Fig. 8 is a block diagram illustrating a configuration example of the
voice recognizing unit 50A;
Fig. 9 is a flowchart describing the processing of the voice recognizing
unit 50A;
Fig. 10 is also a flowchart describing the processing of the voice
recognizing unit 50A;
Fig. 11 is a block diagram illustrating a configuration example of the
voice synthesizing unit 55;
Fig. 12 is a flowchart describing the processing of the voice
synthesizing unit 55;
Fig. 13 is also a flowchart describing the processing of the voice
synthesizing unit 55;
Fig. 14 is a block diagram illustrating a configuration example of the
image recognizing unit 50B;
Fig. 15 is a diagram illustrating the relationship between the load
regarding priority processing, and the CPU power which can be
appropriated to voice recognizing processing; and
Fig. 16 is a flowchart describing the processing of the action
determining mechanism unit 52.
[0016] Fig. 1 illustrates an external configuration example of an
embodiment of a robot to which the present invention has been applied,
and Fig. 2 illustrates an electrical configuration example thereof.
[0017] With the present embodiment, the robot is a dog-type robot, with
leg units 3A, 3B, 3C, and 3D linked to a torso unit 2, at the front and
rear right and left portions, and with a head unit 4 and tail unit 5
respectively linked to the front portion and rear portion of the torso
unit 2.

[0018] The tail unit 5 extends from a base portion 5B provided to the
upper plane of the torso unit 2 so as to be capable of bending or
rocking with a certain degree of freedom.
[0019] Stored in the torso unit 2 are a controller 10 which performs
control of the entire robot, a battery 11 which is the power source for
the robot, an internal sensor unit 14 made up of a battery sensor 12 and
thermal sensor 13, and so forth.
[0020] Positioned in the head unit 4 are a microphone 15 which serves as
an "ear", a CCD (Charge Coupled Device) camera 16 which serves as an
"eye", a touch sensor 17 which acts as the tactual sense, a speaker 18
serving as the "mouth", etc., at the respective positions.

[0021] Further, provided to the joint portions of the leg units 3A
through 3D, the linkage portions of the leg units 3A through 3D to the
torso unit 2, the linkage portion of the head unit 4 to the torso unit
2, the linkage portions of the tail unit 5 to the torso unit 2, etc.,
are actuators 3AA1 through 3AAK, 3BA1 through 3BAK, 3CA1 through 3CAK,
3DA1 through 3DAK, 4A1 through 4AL, 5A1, and 5A2, as shown in Fig. 2.
[0022] The microphone 15 in the head unit 4 collects surrounding voice
(sounds) including speech of the user, and sends the obtained voice
signals to the controller 10. The CCD camera 16 takes images of the
surrounding conditions, and sends the obtained image signals to the
controller 10.
[0023] The touch sensor 17 is provided at the upper portion of the head
unit 4 for example, so as to detect pressure received by physical
actions from the user such as "petting" or "hitting", and sends the
detection results as pressure detection signals to the controller 10.
[0024] The battery sensor 12 in the torso unit 2 detects the remaining
amount of the battery 11, and sends the detection results as remaining
battery amount detection signals to the controller 10. The thermal
sensor 13 detects heat within the robot, and sends the detection results
as thermal detection signals to the controller 10.
[0025] The controller 10 has a CPU (Central Processing Unit) 10A and
memory 10B and the like built in, and performs various types of
processing by executing control programs stored in the memory 10B at the
CPU 10A.

[0026] That is, the controller 10 judges surrounding conditions,
commands from the user, actions performed upon the robot by the user,
etc., or the absence thereof, based on voice signals, image signals,
pressure detection signals, remaining battery amount detection signals,
and thermal detection signals, from the microphone 15, CCD camera 16,
touch sensor 17, battery sensor 12, and thermal sensor 13.

[0027] Further, based on the judgement results and the like, the
controller 10 decides subsequent actions, and drives actuators necessary
to this end from the actuators 3AA1 through 3AAK, 3BA1 through 3BAK,
3CA1 through 3CAK, 3DA1 through 3DAK, 4A1 through 4AL, 5A1, and 5A2,
based on the decision results, thereby causing the robot to perform
actions such as moving the head unit 4 vertically or horizontally,
moving the tail unit 5, driving the leg units 3A through 3D so as to
cause the robot to take actions such as walking, and so forth.
[0028] Also, if necessary, the controller 10 generates synthesized sound
which is supplied to the speaker 18 and output, or causes unshown LEDs
(Light-Emitting Diodes) provided at the position of the "eyes" of the
robot to go on, off, or blink.

[0029] Thus, the robot is arranged so as to act in an autonomous manner,
based on surrounding conditions and the like.
[0030] Next, Fig. 3 illustrates a functional configuration example of
the controller 10 shown in Fig. 2. The functional configuration shown in
Fig. 3 is realized by the CPU 10A executing the control programs stored
in the memory 10B.

[0031] The controller 10 comprises a sensor input processing unit 50
which recognizes specific external states, an emotion/instinct model
unit 51 which accumulates the recognition results of the sensor input
processing unit 50 and expresses the state of emotions and instincts, an
action determining mechanism unit 52 which determines subsequent action
based on the recognition results of the sensor input processing unit 50
and the like, an attitude transition mechanism unit 53 which causes the
robot to actually take actions based on the determination results of the
action determining mechanism unit 52, a control mechanism unit 54 which
drives and controls the actuators 3AA1 through 5A1 and 5A2, and a voice
synthesizing unit 55 which generates synthesized sound.

[0032] The sensor input processing unit 50 recognizes certain external
states, actions performed on the robot by the user, instructions and the
like from the user, etc., based on the voice signals, image signals,
pressure detection signals, etc., provided from the microphone 15, CCD
camera 16, touch sensor 17, etc., and notifies the emotion/instinct
model unit 51 and action determining mechanism unit 52 of state
recognition information representing the recognition results.
[0033] That is, the sensor input processing unit 50 has a voice
recognizing unit 50A, and the voice recognizing unit 50A performs voice
recognition following the control of the action determining mechanism
unit 52 using the voice signals provided from the microphone 15, taking
into consideration the information obtained from the emotion/instinct
model unit 51 and action determining mechanism unit 52 as necessary.
Then, the voice recognizing unit 50A notifies the emotion/instinct model
unit 51 and action determining mechanism unit 52 of instructions and the
like of the voice recognition results, such as "walk", "down", "chase
the ball", for example, as state recognition information.

[0034] Also, the sensor input processing unit 50 has an image
recognizing unit 50B, and the image recognizing unit 50B performs image
recognition processing using image signals provided from the CCD camera
16. In the event that as a result of the processing the image
recognizing unit 50B detects "a red round object" or "a plane vertical
to the ground having a certain height or more", for example, image
recognition results such as "there is a ball" or "there is a wall" are
notified to the emotion/instinct model unit 51 and action determining
mechanism unit 52, as state recognition information.
[0035] Further, the sensor input processing unit 50 has a pressure
processing unit 50C, and the pressure processing unit 50C processes
pressure detection signals provided from the touch sensor 17. Then, in
the event that the pressure processing unit 50C detects, as the result
of the processing, pressure of a certain threshold value or greater
within a short time, the pressure processing unit 50C recognizes that
the robot has been "struck (scolded)", while in the event that the
pressure processing unit 50C detects pressure less than the threshold
value over a long time, the pressure processing unit 50C recognizes that
the robot has been "petted (praised)". The recognition results thereof
are notified to the emotion/instinct model unit 51 and action
determining mechanism unit 52, as state recognition information.
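
The threshold rule of the pressure processing unit 50C can be sketched as follows. This is only a minimal illustration: the source gives no numeric values, so `PRESSURE_THRESHOLD`, `SHORT_DURATION_S`, and the function name `classify_touch` are assumptions introduced here, and cases the text does not cover return `None`.

```python
from typing import Optional

# Hypothetical values: the specification states no numbers for these.
PRESSURE_THRESHOLD = 0.5   # normalized pressure
SHORT_DURATION_S = 0.3     # contact shorter than this counts as "a short time"


def classify_touch(pressure: float, duration_s: float) -> Optional[str]:
    """Map one touch event to the state recognition information of [0035]."""
    # Strong pressure within a short time is recognized as being struck.
    if pressure >= PRESSURE_THRESHOLD and duration_s < SHORT_DURATION_S:
        return "struck (scolded)"
    # Weak pressure over a long time is recognized as being petted.
    if pressure < PRESSURE_THRESHOLD and duration_s >= SHORT_DURATION_S:
        return "petted (praised)"
    # Other combinations are not described in the text.
    return None
```

For instance, `classify_touch(0.9, 0.1)` falls in the "struck" branch and `classify_touch(0.2, 2.0)` in the "petted" branch under these assumed constants.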

[0036] The emotion/instinct model unit 51 manages both an emotion model
and instinct model, representing the state of emotions and instincts of
the robot, as shown in Fig. 4. Here, the emotion model and instinct
model are stored in the memory 10B shown in Fig. 3.
[0037] The emotion model is made up of three emotion units 60A, 60B, and
60C, for example, and the emotion units 60A through 60C each represent
the state (degree) of "happiness", "sadness", and "anger", with a value
within the range of 0 to 100, for example. The values are each changed
based on state recognition information from the sensor input processing
unit 50, passage of time, and so forth.

[0038] Incidentally, an emotion unit corresponding to "fun" can be
provided in addition to "happiness", "sadness", and "anger".

[0039] The instinct model is made up of three instinct units 61A, 61B,
and 61C, for example, and the instinct units 61A through 61C each
represent the state (degree) of "hunger", "desire to sleep", and "desire
to exercise", from instinctive desires, with a value within the range of
0 to 100, for example. The values are each changed based on state
recognition information from the sensor input processing unit 50,
passage of time, and so forth.

[0040] The emotion/instinct model unit 51 outputs the state of emotion
represented by the values of the emotion units 60A through 60C and the
state of instinct represented by the values of the instinct units 61A
through 61C, which change as described above, as emotion/instinct state
information to the sensor input processing unit 50, action determining
mechanism unit 52, and voice synthesizing unit 55.

[0041] Now, at the emotion/instinct model unit 51, the emotion units 60A
through 60C making up the emotion model are linked in a mutually
suppressing or mutually stimulating manner, such that in the event that
the value of one of the emotion units changes, the values of the other
emotion units change accordingly, thus realizing natural emotion change.

[0042] That is, for example, as shown in Fig. 5A, in the emotion model
the emotion unit 60A representing "happiness" and the emotion unit 60B
representing "sadness" are linked in a mutually suppressive manner, such
that in the event that the robot is praised by the user, the value of
the emotion unit 60A for "happiness" first increases. Further, in this
case, the value of the emotion unit 60B for "sadness" decreases in a
manner corresponding with the increase of the value of the emotion unit
60A for "happiness", even though state recognition information for
changing the value of the emotion unit 60B for "sadness" has not been
supplied to the emotion/instinct model unit 51. Conversely, in the event
that the value of the emotion unit 60B for "sadness" increases, the
value of the emotion unit 60A for "happiness" decreases accordingly.

[0043] Further, the emotion unit 60B representing "sadness" and the
emotion unit 60C representing "anger" are linked in a mutually
stimulating manner, such that in the event that the robot is struck by
the user, the value of the emotion unit 60C for "anger" first increases.
Further, in this case, the value of the emotion unit 60B for "sadness"
increases in a manner corresponding with the increase of the value of
the emotion unit 60C for "anger", even though state recognition
information for changing the value of the emotion unit 60B for "sadness"
has not been supplied to the emotion/instinct model unit 51. Conversely,
in the event that the value of the emotion unit 60B for "sadness"
increases, the value of the emotion unit 60C for "anger" increases
accordingly.
[0044] Further, at the emotion/instinct model unit 51, the instinct
units 61A through 61C making up the instinct model are also linked in a
mutually suppressing or mutually stimulating manner, as with the above
emotion model, such that in the event that the value of one of the
instinct units changes, the values of the other instinct units change
accordingly, thus realizing natural instinct change.
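
The mutually suppressing and mutually stimulating links described for the emotion model can be sketched roughly as below. The source does not say how strongly one unit influences another, so `COUPLING_GAIN`, the single-step propagation (linked units are adjusted once, without cascading further), and the names `COUPLING`, `clamp`, and `apply_change` are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

# Sign of each link: -1.0 is mutually suppressing, +1.0 mutually stimulating.
# The pairs follow the happiness/sadness and sadness/anger links in the text.
COUPLING: Dict[Tuple[str, str], float] = {
    ("happiness", "sadness"): -1.0,
    ("sadness", "happiness"): -1.0,
    ("anger", "sadness"): +1.0,
    ("sadness", "anger"): +1.0,
}
COUPLING_GAIN = 0.5  # hypothetical strength; not given in the source


def clamp(value: float) -> float:
    """Keep a unit value within the 0 to 100 range used by the model."""
    return max(0.0, min(100.0, value))


def apply_change(state: Dict[str, float], unit: str, delta: float) -> Dict[str, float]:
    """Change one emotion unit and propagate the change once to linked units."""
    new_state = dict(state)
    new_state[unit] = clamp(state[unit] + delta)
    applied = new_state[unit] - state[unit]  # change actually applied after clamping
    for (src, dst), sign in COUPLING.items():
        if src == unit:
            new_state[dst] = clamp(new_state[dst] + sign * COUPLING_GAIN * applied)
    return new_state
```

With all three units at 50, raising "happiness" by 20 lowers "sadness" under these assumptions, while raising "anger" by 20 raises "sadness", which mirrors the two cases of Fig. 5A described in the text.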

[0045] Also, in addition to state recognition information being supplied
to the emotion/instinct model unit 51 from the sensor input processing
unit 50, action information indicating current or past actions of the
robot, i.e., representing the contents of actions, such as "walked for a
long time" for example, is supplied from the action determining
mechanism unit 52, so that, even in the event that the same state
recognition information is provided, different emotion/instinct state
information is generated according to the actions of the robot indicated
by the action information.

[0046] That is to say, as shown in Fig. 5B for example, with regard to
the emotion model, intensity increasing/decreasing functions 65A through
65C for generating value information for increasing or decreasing the
values of the emotion units 60A through 60C based on the action
information and the state recognition information are each provided to
the step preceding the emotion units 60A through 60C. The values of the
emotion units 60A through 60C are each increased or decreased according
to the value information output from the intensity increasing/decreasing
functions 65A through 65C.
[0047] As a result, in the event that the robot greets the user and the
user pets the robot on the head, for example, the action information of
greeting the user and the state recognition information of having been
petted on the head are provided to the intensity increasing/decreasing
function 65A, and in this case, the value of the emotion unit 60A for
"happiness" is increased at the emotion/instinct model unit 51.

[0048] On the other hand, in the event that the robot is petted on the
head while executing a task of some sort, action information that a task
is being executed and the state recognition information of having been
petted on the head are provided to the intensity increasing/decreasing
function 65A, but in this case, the value of the emotion unit 60A for
"happiness" is not changed at the emotion/instinct model unit 51.
[0049] Thus, the emotion/instinct model unit 51 does not only make
reference to the state recognition information, but also makes reference
to action information indicating the past or present actions of the
robot, and thus sets the values of the emotion units 60A through 60C.
Consequently, in the event that the user mischievously pets the robot on
the head while the robot is executing a task of some sort, unnatural
changes in emotions due to the value of the emotion unit 60A for
"happiness" being increased can be avoided.
[0050] Further, regarding the instinct units 61A through 61C making up
the instinct model, the emotion/instinct model unit 51 increases or
decreases the values of each based on both state recognition information
and action information in the same manner as with the case of the
emotion model.

[0051] Now, the intensity increasing/decreasing functions 65A through
65C are functions which generate and output value information for
changing the values of the emotion units 60A through 60C according to
preset parameters, with the state recognition information and action
information as input thereof, and setting these parameters to different
values for each robot would allow for individual characteristics for
each robot, such as one robot being of a testy nature and another being
jolly, for example.
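
An intensity increasing/decreasing function with preset, per-robot parameters might be sketched as a lookup keyed on the pair of action information and state recognition information. The parameter tables `JOLLY` and `TESTY`, their numeric values, and the function names are hypothetical; the source only states that such parameters exist and can differ between robots.

```python
from typing import Dict, Tuple

# (action information, state recognition information) -> change in "happiness".
ParameterTable = Dict[Tuple[str, str], float]

# Hypothetical parameter sets giving two robots different characters.
JOLLY: ParameterTable = {
    ("greeting user", "petted on head"): 20.0,
    ("executing task", "petted on head"): 0.0,  # no change while busy
}
TESTY: ParameterTable = {
    ("greeting user", "petted on head"): 5.0,
    ("executing task", "petted on head"): 0.0,
}


def intensity_function(params: ParameterTable, action_info: str,
                       state_recognition: str) -> float:
    """Return value information (a signed change) for the "happiness" unit."""
    # Input pairs not listed in the table leave the unit unchanged.
    return params.get((action_info, state_recognition), 0.0)


def update_unit(value: float, delta: float) -> float:
    """Apply the value information, keeping the unit within 0 to 100."""
    return max(0.0, min(100.0, value + delta))
```

Under these assumed tables, the same petting input raises "happiness" more for the jolly robot than for the testy one, and not at all while a task is being executed.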

[0052] Returning to Fig. 3, the action determining mechanism unit 52
decides the next action based on state recognition information from the
sensor input processing unit 50 and emotion/instinct information from
the emotion/instinct model unit 51, passage of time, etc., and the
decided action contents are output to the attitude transition mechanism
unit 53 as action instruction information.

[0053] That is, as shown in Fig. 6, the action determining mechanism
unit 52 manages finite automatons wherein the actions which the robot is
capable of taking correspond to the states, as action models stipulating
the actions of the robot. The state in the finite automaton serving as
the action model is caused to make transition based on state recognition
information from the sensor input processing unit 50, the values of the
emotion model and instinct model at the emotion/instinct model unit 51,
passage of time, etc., and actions corresponding to the state following
the transition are determined to be the actions to be taken next.
[0054] Specifically, for example, in Fig. 6, let us say that state ST3
represents an action of "standing", state ST4 represents an action of
"lying on side", and state ST5 represents an action of "chasing a ball".
Now, in the state ST5 for "chasing a ball" for example, in the event
that state recognition information of "visual contact with ball has been
lost" is supplied, the state makes a transition from state ST5 to state
ST3, and consequently, the action of "standing" which corresponds to
state ST3 is decided upon as the subsequent action. Also, in the event
that the robot is in state ST4 for "lying on side" for example, and
state recognition information of "Get up!" is supplied, the state makes
a transition from state ST4 to state ST3, and consequently, the action
of "standing" which corresponds to state ST3 is decided upon as the
subsequent action.
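
The two transitions described above can be written as a small finite automaton. This is a sketch restricted to the states and inputs named in the text; the table layout and the function name `step` are assumptions, and inputs with no listed transition are assumed to leave the state unchanged.

```python
from typing import Dict, Tuple

# Actions associated with the states named in the text.
ACTIONS: Dict[str, str] = {
    "ST3": "standing",
    "ST4": "lying on side",
    "ST5": "chasing a ball",
}

# (current state, state recognition information) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ST5", "visual contact with ball has been lost"): "ST3",
    ("ST4", "Get up!"): "ST3",
}


def step(state: str, recognition: str) -> Tuple[str, str]:
    """Return the next state and the action decided upon for it."""
    # Assumption: an input with no matching transition keeps the current state.
    next_state = TRANSITIONS.get((state, recognition), state)
    return next_state, ACTIONS[next_state]
```

A fuller model would also key transitions on emotion and instinct values and on elapsed time, which this table omits.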

[0055] Now, in the event that the action determining mechanism unit 52
detects a predetermined trigger, state transition is executed. That is
to say, in the event that the time for the action corresponding to the
current state has reached a predetermined time, in the event that
certain state recognition information has been received, in the event
that the value of the state of emotion (i.e., values of emotion units
60A through 60C) or the value of the state of instinct (i.e., values of
instinct units 61A through 61C) represented by the emotion/instinct
state information supplied from the emotion/instinct model unit 51 is
equal to or less than, or equal to or greater than, a predetermined
threshold value, etc., the action determining mechanism unit 52 causes
state transition.
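
The three kinds of trigger listed above can be combined into one check, sketched here. All parameter names, and the choice to use a single lower and a single upper threshold for every unit, are assumptions; the source only states that predetermined times, certain recognition results, and threshold values exist.

```python
from typing import Mapping, Optional, Set


def transition_triggered(
    elapsed_s: float,
    time_limit_s: float,
    recognition: Optional[str],
    trigger_recognitions: Set[str],
    unit_values: Mapping[str, float],
    low: float,
    high: float,
) -> bool:
    """Return True if any of the triggers described in [0055] has occurred."""
    # Trigger 1: the current action has run for the predetermined time.
    if elapsed_s >= time_limit_s:
        return True
    # Trigger 2: certain state recognition information has been received.
    if recognition is not None and recognition in trigger_recognitions:
        return True
    # Trigger 3: an emotion or instinct value has crossed a threshold.
    return any(v <= low or v >= high for v in unit_values.values())
```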

[0056] Note that the action determining mechanism unit 52 causes state
transition of the finite automaton in Fig. 6 based not only on state
recognition information from the sensor input processing unit 50, but
also on values of the emotion model and instinct model from the
emotion/instinct model unit 51, etc., so that even in the event that the
same state recognition information is input, the destination of
transition of the state differs according to the emotion model and
instinct model (i.e., emotion/instinct information).

[0057] Consequently, in the event that the emotion/instinct state
information indicates that the state is "not angry" and "not hungry",
for example, and in the event that the state recognition information
indicates "the palm of a hand being held out in front", the action
determining mechanism unit 52 generates action instruction information
for causing an action of "shaking hands" in accordance with the hand
being held out in front, and this is sent to the attitude transition
mechanism unit 53.
[0058] Also, in the event that the emotion/instinct state information
indicates that the state is "not angry" and "hungry", for example, and
in the event that the state recognition information indicates "the palm
of a hand being held out in front", the action determining mechanism
unit 52 generates action instruction information for causing an action
of "licking the hand" in accordance with the hand being held out in
front, and this is sent to the attitude transition mechanism unit 53.
[0059] Further, in the event that the emotion/instinct state information
indicates that the state is "angry" for example, and in the event that
the state recognition information indicates "the palm of a hand being
held out in front", the action determining mechanism unit 52 generates
action instruction information for causing an action of "looking the
other way", regardless of whether the emotion/instinct information
indicates "hungry" or "not hungry", and this is sent to the attitude
transition mechanism unit 53.
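
The three cases above amount to a small decision table in which "angry" takes precedence over "hungry". A minimal sketch follows; the function name `decide_action` and the boolean encoding of the emotion/instinct state are assumptions for illustration.

```python
from typing import Optional

HAND_HELD_OUT = "the palm of a hand being held out in front"


def decide_action(angry: bool, hungry: bool, recognition: str) -> Optional[str]:
    """Choose an action for the same input depending on emotion and instinct."""
    if recognition != HAND_HELD_OUT:
        return None  # other inputs are outside the cases described in the text
    if angry:
        # Anger overrides hunger: the result is the same either way.
        return "looking the other way"
    return "licking the hand" if hungry else "shaking hands"
```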

[0060] Incidentally, the action determining mechanism unit 52 is capable
of determining the speed of walking, the magnitude of movement of the
legs and the speed thereof, etc., serving as parameters of action
corresponding to the state to which transition has been made, based on
the state of emotions and instincts indicated by the emotion/instinct
state information supplied from the emotion/instinct model unit 51.
[0061] Also, in addition to action instruction information for causing
movement of the robot head, legs, etc., the action determining mechanism
unit 52 generates action instruction information for causing speech by
the robot, and action instruction information for causing the robot to
execute speech recognition. The action instruction information for
causing speech by the robot is supplied to the voice synthesizing unit
55, and the action instruction information supplied to the voice
synthesizing unit 55 contains text and the like corresponding to the
synthesized sound to be generated by the voice synthesizing unit 55.
Once the voice synthesizing unit 55 receives the action instruction
information from the action determining mechanism unit 52, synthesized
sound is generated based on the text contained in the action instruction
information while adding in the state of emotions and the state of
instincts managed by the emotion/instinct model unit 51, and the
synthesized sound is supplied to and output from the speaker 18. Also,
the action instruction information for causing the robot to execute
speech recognition is supplied to the voice recognizing unit 50A of the
sensor input processing unit 50, and upon receiving such action
instruction information, the voice recognizing unit 50A performs voice
recognizing processing.
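
How the emotion state might be "added in" to synthesis can be illustrated with the example from the abstract, where the wording changes with anger. This is a sketch only: the source does not define how "angry" is derived from the value of the emotion unit 60C, so `ANGER_THRESHOLD` and `select_phrase` are hypothetical.

```python
ANGER_THRESHOLD = 50.0  # hypothetical cut-off on the 0 to 100 "anger" value


def select_phrase(anger: float) -> str:
    """Pick the wording of the reply according to the anger value."""
    if anger >= ANGER_THRESHOLD:
        return "Yeah, what?"   # wording used when the state is "angry"
    return "What is it?"       # wording used when the state is "not angry"
```

The summary of the invention also mentions controlling pitch, speech speed, and volume, which this sketch does not model.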

[0062] Further, the action determining mechanism unit 52 is arranged so
as to supply the same action information supplied to the
emotion/instinct model unit 51, to the sensor input processing unit 50
and the voice synthesizing unit 55. The voice recognizing unit 50A of
the sensor input processing unit 50 and the voice synthesizing unit 55
each perform voice recognizing and voice synthesizing, adding in the
action information from the action determining mechanism unit 52. This
point will be described later.

[0063] The attitude transition mechanism unit 53 generates attitude
transition information for causing transition of the attitude of the
robot from the current attitude to the next attitude, based on the
action instruction information from the action determining mechanism
unit 52, and outputs this to the control mechanism unit 54.
[0064] Now, a next attitude to which transition can be made from the
current attitude is determined by, e.g., the physical form of the robot
such as the form, weight, and linkage state of the torso and legs, and
the mechanism of the actuators 3AA1 through 5A1 and 5A2 such as the
direction and angle in which the joints will bend, and so forth.

[0065] Also, regarding the next attitude, there are attitudes to which
transition can be made directly from the current attitude, and attitudes
to which transition cannot be directly made from the current attitude.
For example, a quadruped robot in a state lying on its side with its
legs straight out can directly make transition to a state of lying
prostrate, but cannot directly make transition to a state of standing,
so there is the need to first draw the legs near to the body and change
to a state of lying prostrate, following which the robot stands up,
i.e., actions in two stages are necessary. Also, there are attitudes to
which transition cannot be made safely. For example, in the event that a
quadruped robot in an attitude of standing on four legs attempts to
raise both front legs, the robot will readily fall over.

[0066] Accordingly, the attitude transition mechanism unit 53 registers
beforehand attitudes to which direct transition can be made, and in the
event that the action instruction information supplied from the action
determining mechanism unit 52 indicates an attitude to which direct
transition can be made, the action instruction information is output
without change as attitude transition information to the control
mechanism unit 54. On the other hand, in the event that the action
instruction information indicates an attitude to which direct transition
cannot be made, the attitude transition mechanism unit 53 first makes
transition to another attitude to which direct transition can be made,
following which attitude transition information is generated for causing
transition to the object attitude, and this information is sent to the
control mechanism unit 54. Thus, incidents wherein the robot attempts to
assume attitudes to which transition is impossible, and incidents
wherein the robot falls over, can be prevented.
[0067] That is to say, as shown in Fig. 7 for example,
the attitude transition mechanism unit 53 stores an
oriented graph wherein the attitudes which the robot can
assume are represented as nodes NODE 1 through
NODE 5, and nodes corresponding to two attitudes
between which transition can be made are linked by
oriented arcs ARC 1 through ARC 10, thereby generating
attitude transition information such as described above,
based on this oriented graph.
[0068] Specifically, in the event that action instruction
information is supplied from the action determining
mechanism unit 52, the attitude transition mechanism
unit 53 searches a path from the current node to the next
node by following the direction of the oriented arcs
connecting the node corresponding to the current attitude
and the node corresponding to the next attitude to be
assumed which the action instruction information
indicates, thereby generating attitude transition information
wherein attitudes corresponding to the nodes on the
searched path are assumed.

[0069] Consequently, in the event that the current
attitude is the node NODE 2 which indicates the attitude
of "lying prostrate", for example, and action instruction
information of "sit" is supplied, the attitude transition
mechanism unit 53 generates attitude transition
information corresponding to "sit", since direct transition can
be made from the node NODE 2 which indicates the attitude
of "lying prostrate" to the node NODE 5 which indicates
the attitude of "sitting" in the oriented graph, and this
information is provided to the control mechanism unit
54.

[0070] Also, in the event that the current attitude is the
node NODE 2 which indicates the attitude of "lying
prostrate", and action instruction information of "walk" is
supplied, the attitude transition mechanism unit 53
searches a path from the node NODE 2 which indicates the
attitude of "lying prostrate" to the node NODE 4 which
indicates the attitude of "walking", in the oriented graph.
In this case, the path obtained is NODE 2 which indicates
the attitude of "lying prostrate", NODE 3 which indicates
the attitude of "standing", and NODE 4 which indicates the
attitude of "walking", so the attitude transition mechanism
unit 53 generates attitude transition information in
the order of "standing" and "walking", which is sent to
the control mechanism unit 54.
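
The path search over the oriented graph can be illustrated with a
short sketch. This is not the implementation of the attitude
transition mechanism unit 53; it assumes a breadth-first search, uses
attitude names in place of node numbers, and lists only the arcs that
the text itself mentions (the full graph of Fig. 7 is not reproduced).
The names ARCS and find_attitude_path are hypothetical.

```python
from collections import deque

# Hypothetical subset of the oriented graph: only arcs named in the text.
ARCS = {
    "lying on side": ["lying prostrate"],
    "lying prostrate": ["sitting", "standing"],
    "standing": ["walking"],
    "sitting": [],
    "walking": [],
}


def find_attitude_path(graph, current, target):
    """Return the attitudes to assume, in order, to get from current to target.

    Breadth-first search following the arc directions. Returns None when
    the target cannot be reached from the current attitude.
    """
    queue = deque([[current]])
    visited = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path[1:]  # the current attitude itself is not re-issued
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


walk_path = find_attitude_path(ARCS, "lying prostrate", "walking")
sit_path = find_attitude_path(ARCS, "lying prostrate", "sitting")
```

With the arcs above, a request to walk while lying prostrate yields
the two-stage sequence "standing", "walking", while a request to sit
yields the single attitude "sitting", since that arc is direct.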

[0071] The control mechanism unit 54 generates control
signals for driving the actuators 3AA1 through 5A1
and 5A2 according to the attitude transition information
from the attitude transition mechanism unit 53, and
sends these signals to the actuators 3AA1 through
5A1 and 5A2. Thus, the actuators 3AA1 through 5A1 and
5A2 are driven according to the control signals, and the
robot acts in an autonomic manner.
[0072] Next, Fig. 8 illustrates a configuration example
of the voice recognizing unit 50A shown in Fig. 3.

[0073] Audio signals from the microphone 15 are supplied
to an A/D (Analog/Digital) converting unit 21. At
the A/D converting unit 21, the analog voice signals from
the microphone 15 are sampled and quantized, and
subjected to A/D conversion into digital voice signal data.
This voice data is supplied to a characteristics
extracting unit 22.

[0074] The characteristics extracting unit 22 performs
MFCC (Mel Frequency Cepstrum Coefficient) analysis,
for example, for each appropriate frame of the input voice
data, and outputs the analysis results to the matching
unit 23 as characteristics parameters (characteristics
vectors). Incidentally, at the characteristics extracting
unit 22, characteristics extracting can be performed
otherwise, such as extracting linear prediction coefficients,
cepstrum coefficients, line spectrum pairs, power for
predetermined frequency bands (filterbank output), etc., as
characteristics parameters.

[0075] Also, the characteristics extracting unit 22
extracts pitch information from the voice data input thereto.
That is, the characteristics extracting unit 22 performs
autocorrelation analysis of the voice data, for example,
thereby extracting pitch information, i.e., information
relating to the pitch frequency, power (amplitude),
intonation, etc., of the voice input to the microphone 15.
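
As an illustration of the autocorrelation analysis mentioned above,
the following sketch estimates the pitch frequency of one frame from
the lag at which the autocorrelation peaks, and computes the frame
power. The function names, the 50 to 400 Hz search range, and the
synthetic test tone are assumptions made for this example; they are
not taken from the specification.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=400.0):
    """Estimate pitch from the autocorrelation peak within a lag range."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None  # no positive correlation peak: treat as unvoiced
    return sample_rate / best_lag


def frame_power(samples):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in samples) / len(samples)


# Synthetic 200 Hz tone sampled at 8 kHz, used only as a test signal.
RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]
pitch = estimate_pitch_hz(tone, RATE)
power = frame_power(tone)
```

Restricting the lag range keeps the search away from lag zero, where
the autocorrelation is always largest, and from multiples of the true
period far outside the range of a speaking voice.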

[0076] The matching unit 23 performs voice recognition
of the voice input to the microphone 15 (i.e., the
input voice) using the characteristics parameters from
the characteristics extracting unit 22, based on continuous
distribution HMM (Hidden Markov Model) for example,
while making reference to the acoustics model storing
unit 24, dictionary storing unit 25, and grammar storing
unit 26, as necessary.

[0077] That is to say, the acoustics model storing unit
24 stores acoustics models representing acoustical
characteristics such as individual phonemes and syllables
in the language of the voice which is to be subjected
to voice recognition. Here, voice recognition is performed
based on the continuous distribution HMM method,
so the HMM (Hidden Markov Model) is used as the
acoustics model. The dictionary storing unit 25 stores
word dictionaries describing information relating to the
pronunciation (i.e., phonemics information) for each
word to be recognized. The grammar storing unit 26
stores syntaxes describing the manner in which the
words registered in the word dictionary of the dictionary
storing unit 25 concatenate (connect). The syntax used
here may be rules based on context-free grammar
(CFG), stochastic word concatenation probability
(N-gram), and so forth.

[0078] The matching unit 23 connects the acoustic
models stored in the acoustics model storing unit 24 by
making reference to the word dictionaries stored in the
dictionary storing unit 25, thereby configuring word
acoustic models (word models). Further, the matching
unit 23 connects multiple word models by making reference
to the syntaxes stored in the grammar storing unit
26, and recognizes the speech input from the microphone
15 using the word models thus connected, based
on the characteristics parameters, by continuous
distribution HMM.

That is to say, the matching unit 23 detects a word model
sequence with the highest score (likelihood) of observation
of the time-sequence characteristics parameters
output by the characteristics extracting unit 22, and the
phonemics information (reading) of the word string
correlating to the word model sequence is output as the
voice recognition results.

[0079] That is to say, the matching unit 23 accumulates
the emergence probability of each of the characteristics
parameters regarding word strings corresponding
to the connected word models, and with the accumulated
value as the score thereof, outputs the phonemics
information of the word string with the highest
score as the voice recognition results.
[0080] Further, the matching unit 23 outputs the score
of the voice recognizing results as reliability information
representing the reliability of the voice recognizing
results.
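
The score accumulation described above can be sketched as follows.
This is a heavily simplified stand-in for continuous distribution
HMM matching: each word model is assumed to be a left-to-right chain
of one-dimensional Gaussian states, transition probabilities are
taken as equal and omitted, and the characteristics parameters are
single numbers rather than vectors. All names and model values are
invented for the example.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_score(frames, states):
    """Best accumulated log-likelihood of frames through a left-to-right model.

    states is a list of (mean, variance) pairs. Each frame either stays in
    the current state or advances to the next one.
    """
    neg_inf = float("-inf")
    score = [neg_inf] * len(states)
    score[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new = [neg_inf] * len(states)
        for j, (mean, var) in enumerate(states):
            best_prev = max(score[j], score[j - 1] if j > 0 else neg_inf)
            if best_prev > neg_inf:
                new[j] = best_prev + log_gauss(x, mean, var)
        score = new
    return score[-1]  # the path must end in the final state


def recognize(frames, word_models):
    """Return (best word, its score); the score serves as reliability."""
    scored = {w: viterbi_score(frames, m) for w, m in word_models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy one-dimensional "acoustic models"; the numbers are invented.
WORD_MODELS = {
    "sit": [(1.0, 0.5), (3.0, 0.5)],
    "walk": [(-2.0, 0.5), (0.0, 0.5), (2.0, 0.5)],
}
FRAMES = [0.9, 1.2, 2.8, 3.1]
word, score = recognize(FRAMES, WORD_MODELS)
```

Log probabilities are summed instead of multiplying probabilities,
which is the usual way to accumulate emergence probabilities without
numerical underflow.
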
[0081] Also, the matching unit 23 detects the duration
of each phoneme and word making up the voice recognizing
results, which is obtained along with score calculation
such as described above, and outputs this as phonemics
information of the voice input to the microphone
15.

[0082] The recognition results of the voice input to the
microphone 15, the phonemics information, and reliability
information, output as described above, are output to
the emotion/instinct model unit 51 and action determining
mechanism unit 52, as state recognition information.
[0083] The voice recognizing unit 50A configured as
described above is subjected to control of voice recognition
processing based on the state of emotions and
instincts of the robot, managed by the emotion/instinct
model unit 51. That is, the state of emotions and instincts
of the robot managed by the emotion/instinct model unit
51 is supplied to the characteristics extracting unit 22
and the matching unit 23, and the characteristics
extracting unit 22 and the matching unit 23 change the
processing contents based on the state of emotions and
instincts of the robot supplied thereto.
[0084] Specifically, as shown in the flowchart in Fig.
9, once action instruction information instructing voice
recognition processing is transmitted from the action
determining mechanism unit 52, the action instruction
information is received in step S1, and the blocks making
up the voice recognizing unit 50A are set to an active
state. Thus, the voice recognizing unit 50A is set in a
state capable of accepting the voice that has been input
to the microphone 15.

[0085] Incidentally, the blocks making up the voice
recognizing unit 50A may be set to an active state at all
times. In this case, an arrangement may be made for
example wherein the processing from step S2 on in Fig.
9 is started at the voice recognizing unit 50A each time
the state of emotions and instincts of the robot managed
by the emotion/instinct model unit 51 changes.
[0086] Subsequently, the characteristics extracting
unit 22 and the matching unit 23 recognize the state of
emotions and instincts of the robot by making reference
to the emotion/instinct model unit 51 in step S2, and the
flow proceeds to step S3. In step S3, the matching unit
23 sets word dictionaries to be used for the above-described
score calculating (matching), based on the state
of emotions and instincts.
[0087] That is to say, here, the dictionary storing unit
25 divides the words which are to be the object of
recognition into several categories, and stores multiple
word dictionaries with words registered for each
category. In step S3, word dictionaries to be used for voice
recognizing are set based on the state of emotions and
instincts of the robot.

[0088] Specifically, in the event that there is a word
dictionary with the word "shake hands" registered in the
dictionary storing unit 25 and also a word dictionary
without the word "shake hands" registered therein, and in
the event that the state of emotion of the robot
represents "pleasant", the word dictionary with the word
"shake hands" registered therein is used for voice
recognizing. However, in the event that the state of emotion
of the robot represents "cross", the word dictionary with
the word "shake hands" not registered therein is used
for voice recognizing. Accordingly, in the event that the
state of emotion of the robot is pleasant, the speech
"shake hands" is recognized, and the voice recognizing
results thereof are supplied to the action determining
mechanism unit 52, thereby causing the robot to take
action corresponding to the speech "shake hands" as
described above. On the other hand, in the event that
the state of emotion of the robot is cross, the speech
"shake hands" is not recognized (or is erroneously
recognized), so the robot makes no response thereto (or takes
actions unrelated to the speech "shake hands").
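
A minimal sketch of the dictionary switching in step S3 follows. Only
the word "shake hands" and the emotion states "pleasant" and "cross"
come from the text; the other vocabulary entries, the function names,
and the choice to fall back to the fuller dictionary for any state
other than "cross" are assumptions. Matching is reduced to a set
lookup purely for illustration.

```python
# Hypothetical word dictionaries; only "shake hands" comes from the text.
DICTIONARY_WITH_SHAKE = {"shake hands", "sit", "walk"}
DICTIONARY_WITHOUT_SHAKE = {"sit", "walk"}


def select_dictionary(emotion):
    """Pick the word dictionary used for matching from the emotion state."""
    if emotion == "cross":
        return DICTIONARY_WITHOUT_SHAKE
    # Assumption: every other state uses the fuller dictionary.
    return DICTIONARY_WITH_SHAKE


def recognize_word(utterance, emotion):
    """Stand-in for matching: a word outside the dictionary yields None."""
    vocabulary = select_dictionary(emotion)
    return utterance if utterance in vocabulary else None


pleasant_result = recognize_word("shake hands", "pleasant")
cross_result = recognize_word("shake hands", "cross")
```

In a real recognizer an out-of-dictionary word would more likely be
matched to the closest registered word, which corresponds to the
"erroneously recognized" case in the text.
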
[0089] Incidentally, the arrangement here is such that
multiple word dictionaries are prepared, and the word
dictionaries to be used for voice recognizing are selected
based on the state of emotions and instincts of the
robot, but other arrangements may be made, such as
an arrangement for example wherein just one word
dictionary is provided and words to serve as the object of
voice recognizing are selected from the word dictionary,
based on the state of emotions and instincts of the robot.
[0090] Following the processing of step S3, the flow
proceeds to step S4, and the characteristics extracting
unit 22 and the matching unit 23 set the parameters to
be used for voice recognizing processing (i.e., recognition
parameters), based on the state of emotions and
instincts of the robot.

[0091] That is, for example, in the event that the emotion
state of the robot indicates "angry" or the instinct
state of the robot indicates "sleepy", the characteristics
extracting unit 22 and the matching unit 23 set the
recognition parameters such that the voice recognition
precision deteriorates. On the other hand, in the event that
the emotion state of the robot indicates "pleasant", the
characteristics extracting unit 22 and the matching unit
23 set the recognition parameters such that the voice
recognition precision improves.

[0092] Now, recognition parameters which affect the
voice recognition precision include, for example, threshold
values compared with the voice input to the microphone
15, used in detection of voice sections, and so
forth.
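
The effect of such a threshold can be sketched as follows. The text
only states that a threshold used for detecting voice sections is one
of the recognition parameters; the threshold values, the "neutral"
default state, and the use of per-frame power as the compared
quantity are assumptions made for this example.

```python
# Invented threshold values keyed by emotion or instinct state.
THRESHOLDS = {"pleasant": 0.01, "neutral": 0.02, "angry": 0.08, "sleepy": 0.08}


def voice_section_threshold(state):
    """Look up the detection threshold, defaulting to the neutral value."""
    return THRESHOLDS.get(state, THRESHOLDS["neutral"])


def detect_voice_sections(frame_powers, threshold):
    """Return (start, end) frame index pairs, end exclusive, above threshold."""
    sections, start = [], None
    for i, p in enumerate(frame_powers):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(frame_powers)))
    return sections


POWERS = [0.0, 0.03, 0.05, 0.2, 0.3, 0.04, 0.0]
pleasant_sections = detect_voice_sections(POWERS, voice_section_threshold("pleasant"))
angry_sections = detect_voice_sections(POWERS, voice_section_threshold("angry"))
```

With the higher threshold, the quiet onset and tail of the utterance
fall outside the detected voice section, which is one way in which
recognition precision could be made to deteriorate.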

[0093] Subsequently, the flow proceeds to step S5,
wherein the voice input to the microphone 15 is taken
into the characteristics extracting unit 22 via the A/D
converting unit 21, and the flow proceeds to step S6. At
step S6, the above-described processing is performed
at the characteristics extracting unit 22 and the matching
unit 23 under the settings made in steps S3 and S4, thereby
executing voice recognizing of the voice input to the
microphone 15. Then, the flow proceeds to step S7, and
the phonemics information, pitch information, and
reliability information, which are the voice recognition results
obtained by the processing in step S6, are output to the
emotion/instinct model unit 51 and action determining
mechanism unit 52 as state recognition information, and
the processing ends.

[0094] Upon receiving such state recognition information
from the voice recognizing unit 50A, the emotion/
instinct model unit 51 changes the values of the emotion
model and instinct model as described with Fig. 5, based
on the state recognition information, thereby changing
the state of emotions and the state of instincts of the
robot.

[0095] That is, for example, in the event that the
phonemics information serving as the voice recognition
results in the state recognition information is "Fool!", the
emotion/instinct model unit 51 increases the value of the
emotion unit 60C for "anger". Also, the emotion/instinct
model unit 51 changes the value information output by
the increasing/decreasing functions 65A through 65C,
based on pitch frequency serving as the phonemics
information in the state recognition information, and the
power and duration thereof, thereby changing the values
of the emotion model and instinct model.
[0096] Also, in the event that the reliability information
in the state recognition information indicates that the
reliability of the voice recognition results is low, the
emotion/instinct model unit 51 increases the value of the
emotion unit 60B for "sadness", for example. On the other
hand, in the event that the reliability information in the
state recognition information indicates that the reliability
of the voice recognition results is high, the emotion/instinct
model unit 51 increases the value of the emotion
unit 60A for "happiness", for example.
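
The updates of the emotion units described above can be sketched as
follows. The sketch assumes that reliability is normalized to the
range 0 to 1, that "low" and "high" are fixed cut-offs, and that each
update adds a fixed step; none of these values appear in the text,
and the insulting phrase is passed in as a parameter rather than
fixed. The function name is hypothetical.

```python
def update_emotions(emotions, recognized_text, reliability,
                    insults=("Fool!",), low=0.3, high=0.7, step=10):
    """Return a new emotion table updated from state recognition information.

    emotions maps "happiness", "sadness", and "anger" to numeric values.
    """
    updated = dict(emotions)
    if recognized_text in insults:
        updated["anger"] += step      # insulting speech raises anger
    if reliability < low:
        updated["sadness"] += step    # poorly understood speech raises sadness
    elif reliability > high:
        updated["happiness"] += step  # well understood speech raises happiness
    return updated


BASE = {"happiness": 50, "sadness": 50, "anger": 50}
after_insult = update_emotions(BASE, "Fool!", 0.9)
after_mumble = update_emotions(BASE, "shake hands", 0.1)
```

Returning a new table rather than modifying the input keeps the
previous emotion state available for comparison.
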
[0097] Upon receiving the state recognition information
from the voice recognizing unit 50A, the action
determining mechanism unit 52 determines the next action
of the robot based on the state recognition information,
and generates action instruction information for
representing that action.

[0098] That is to say, the action determining mechanism
unit 52 determines an action to take corresponding
to the phonemics information of the voice recognizing
results in the state recognizing information as described
above, for example (e.g., determines to shake hands in
the event that the voice recognizing results are "shake
hands").

[0099] Or, in the event that the reliability information
in the state recognizing information indicates that the
reliability of the voice recognizing results is low, the
action determining mechanism unit 52 determines to take
an action such as cocking the head or acting apologetically,
for example. On the other hand, in the event that
the reliability information in the state recognizing
information indicates that the reliability of the voice
recognizing results is high, the action determining mechanism
unit 52 determines to take an action such as nodding
the head, for example. In this case, the robot can
indicate to the user the degree of understanding of the
speech of the user.

[0100] Next, action information indicating the contents
of current or past actions of the robot is supplied
from the action determining mechanism unit 52 to the
voice recognizing unit 50A, as described above, and the
voice recognizing unit 50A can be arranged to perform
control of the voice recognizing processing based on the
action information. That is, the action information output
from the action determining mechanism unit 52 is supplied
to the characteristics extracting unit 22 and the
matching unit 23, and the characteristics extracting unit
22 and the matching unit 23 can be arranged to change
the processing contents based on the action information
supplied thereto.

[0101] Specifically, as shown in the flowchart in Fig.
10, upon action instruction information instructing the
voice recognizing processing being transmitted from the
action determining mechanism unit 52, the action
instruction information is received at the voice recognizing
unit 50A in step S11 in the same manner as that of step
S1 in Fig. 9, and the blocks making up the voice
recognizing unit 50A are set to an active state.
[0102] Incidentally, as described above, the blocks
making up the voice recognizing unit 50A may be set to
an active state at all times. In this case, an arrangement
may be made for example wherein the processing from
step S12 on in Fig. 10 is started at the voice recognizing
unit 50A each time the action information output from
the action determining mechanism unit 52 changes.
[0103] Subsequently, the characteristics extracting
unit 22 and the matching unit 23 make reference to the
action information output from the action determining
mechanism unit 52 in step S12, and the flow proceeds
to step S13. In step S13, the matching unit 23 sets word
dictionaries to be used for the above-described score
calculating (matching), based on the action information.
[0104] That is, for example, in the event that the action
information represents the current action to be "sitting"
or "lying on side", it is basically inconceivable that the
user would say "Sit!" to the robot. Accordingly, the
matching unit 23 sets the word dictionaries of the
dictionary storing unit 25 so that the word "Sit!" is excluded
from the object of speech recognition, in the event that
the action information represents the current action to
be "sitting" or "lying on side". In this case, no speech
recognition is made regarding the speech "Sit!". Further,
in this case, the number of words which are the object
of speech recognition decreases, thereby enabling
increased processing speeds and improved recognition
precision.
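
The exclusion of words according to the current action can be sketched
as a set difference. The exclusion table and the other vocabulary
entries are hypothetical; only the rule that "Sit!" is dropped while
"sitting" or "lying on side" comes from the text.

```python
# Hypothetical table of commands that are dropped during a given action.
EXCLUDED_BY_ACTION = {
    "sitting": {"Sit!"},
    "lying on side": {"Sit!"},
}

FULL_VOCABULARY = {"Sit!", "Walk!", "Shake hands!"}


def active_vocabulary(full_vocabulary, current_action):
    """Return the words that remain the object of speech recognition."""
    excluded = EXCLUDED_BY_ACTION.get(current_action, set())
    return set(full_vocabulary) - excluded


while_sitting = active_vocabulary(FULL_VOCABULARY, "sitting")
while_walking = active_vocabulary(FULL_VOCABULARY, "walking")
```

Fewer candidate words means fewer word models to score per utterance,
which is where the speed gain mentioned in the text would come from.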

[0105] Following the processing of step S13, the flow
proceeds to step S14, and the characteristics extracting
unit 22 and the matching unit 23 set the parameters to
be used for voice recognition processing (i.e., recognition
parameters) based on the action information.
[0106] That is, in the event that the action information
represents "walking", for example, the characteristics
extracting unit 22 and the matching unit 23 set the
recognition parameters such that priority is given to
precision over processing speed, as compared to cases
wherein the action information represents "sitting" or
"lying prostrate", for example.

[0107] On the other hand, in the event that the action
information represents "sitting" or "lying prostrate", for
example, the recognition parameters are set such that
priority is given to processing speed over precision, as
compared to cases wherein the action information
represents "walking", for example.
[0108] In the event that the robot is walking, the noise
level from the driving of the actuators 3AA1 through 5A1
and 5A2 is higher than in the case of sitting or lying
prostrate, and generally, the precision of voice recognition
deteriorates due to the effects of the noise. Thus, setting
the recognition parameters such that priority is given to
precision over processing speed in the event that the
robot is walking allows deterioration of voice recognition
precision due to the noise to be prevented (reduced).
[0109] On the other hand, in the event that the robot
is sitting or lying prostrate, there is no noise from the
above actuators 3AA1 through 5A1 and 5A2, so there is
no deterioration of voice recognition precision due to the
driving noise. Accordingly, setting the recognition
parameters such that priority is given to processing speed
over precision in the event that the robot is sitting or lying
prostrate allows the processing speed of voice recognition
to be improved, while maintaining a certain level of
voice recognition precision.
[0110] Now, as for recognition parameters which affect
the precision and processing speed of voice recognition,
there is, for example, the range of hypotheses used in the
event of restricting the range serving as the object of
score calculation by the beam search method at the
matching unit 23 (i.e., the beam width for the beam
search), and so forth.
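
The trade-off controlled by the beam width can be sketched as follows.
The sketch assumes a score-window beam, in which hypotheses whose
score falls more than the beam width below the best score are dropped
from further score calculation. The width values per action and the
hypothesis scores are invented; the text only states that the beam
width is one such recognition parameter.

```python
def beam_width_for(action):
    """Assumed widths: wider (more precise, slower) while walking."""
    return 8.0 if action == "walking" else 2.0


def prune_hypotheses(hypotheses, beam_width):
    """Keep hypotheses whose score is within beam_width of the best score."""
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= best - beam_width}


# Invented partial hypotheses with accumulated log scores.
HYPOTHESES = {"sit": -10.0, "sip": -11.5, "walk": -15.0, "stand": -30.0}
kept_walking = prune_hypotheses(HYPOTHESES, beam_width_for("walking"))
kept_sitting = prune_hypotheses(HYPOTHESES, beam_width_for("sitting"))
```

A wide beam keeps more hypotheses alive, so a word that scores poorly
in noisy early frames can still win later, at the cost of more score
calculation per frame.
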
[0111] Subsequently, the flow proceeds to step S15,
the voice input to the microphone 15 is taken into the
characteristics extracting unit 22 via the A/D converting
unit 21, and the flow proceeds to step S16. At step S16,
the above-described processing is performed at the
characteristics extracting unit 22 and the matching unit
23 under the settings made in steps S13 and S14, thereby
executing voice recognizing of the voice input to the
microphone 15. Then, the flow proceeds to step S17,
and the phonemics information, pitch information, and
reliability information, which are the voice recognition
results obtained by the processing in step S16, are output
to the emotion/instinct model unit 51 and action
determining mechanism unit 52 as state recognition
information, and the processing ends.
[0112] Upon receiving such state recognition information
from the voice recognizing unit 50A, the emotion/
instinct model unit 51 and action determining mechanism
unit 52 change the values of the emotion model
and instinct model as described above based on the
state recognition information, and determine the next
action of the robot.

[0113] Also, the above arrangement involves
setting the recognition parameters such that priority is
given to precision over processing speed in the event
that the robot is walking, since the effects of noise from
the driving of the actuators 3AA1 through 5A1 and 5A2
cause the precision of voice recognition to deteriorate,
thereby preventing deterioration of voice recognition
precision due to the noise; however, an arrangement may be
made wherein, in the event that the robot is walking, the
robot is caused to temporarily stop to perform voice
recognition, and prevention of deterioration of voice
recognition precision can be realized with such an
arrangement as well.

[0114] Next, Fig. 11 illustrates a configuration exam- 
ple of the voice synthesizing unit 55 shown in Fig. 3. 
[0115] The action instruction information containing
text which is the object of voice synthesizing, output by
the action determining mechanism unit 52, is supplied
to the text generating unit 31, and the text generating
unit 31 analyzes the text contained in the action
instruction information, making reference to the dictionary
storing unit 34 and analyzing grammar storing unit
35.

[0116] That is, the dictionary storing unit 34 has stored
therein word dictionaries describing part of speech
information for each word, reading, accentuation, and other
information thereof. Also, the analyzing grammar
storing unit 35 stores analyzing syntaxes relating to
restrictions of word concatenation and the like, regarding
the words described in the word dictionaries in the
dictionary storing unit 34. Then, the text generating unit 31
performs morpheme analysis and grammatical structure
analysis of the input text based on the word dictionaries
and analyzing syntaxes, and extracts information
necessary to the rule voice synthesizing performed by
the subsequent-stage rules synthesizing unit 32. Here,
examples of information necessary for rule voice
synthesizing include pause positions, pitch information such
as information for controlling accents and intonation,
phonemics information such as the pronunciation and the
like of each word, and so forth.

[0117] The information obtained at the text generating
unit 31 is then supplied to the rules synthesizing unit 32,
and at the rules synthesizing unit 32, voice data (digital
data) of synthesized sound corresponding to the text
input to the text generating unit 31 is generated using
the phoneme storing unit 36.

[0118] That is, phoneme data in the form of CV
(Consonant, Vowel), VCV, CVC, etc., is stored in the
phoneme storing unit 36, so the rules synthesizing unit 32
connects the necessary phoneme data based on the
information from the text generating unit 31, and further
adds pauses, accents, intonation, etc., in an appropriate
manner, thereby generating voice data of synthesized
sound corresponding to the text input to the text
generating unit 31.
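
The connection of phoneme data can be sketched as a simple
concatenation with silence inserted at pause positions. The phoneme
store contents, unit names, and sample values are invented, and
accents and intonation are not modeled in this sketch; it only shows
the connecting and pausing steps.

```python
# Hypothetical phoneme store: unit name -> waveform samples (invented).
PHONEME_STORE = {
    "wa": [0.1, 0.3, 0.1],
    "ta": [0.2, -0.2],
    "shi": [0.05, 0.1, 0.05],
}


def synthesize(units, store, pause_after=(), pause_samples=2):
    """Concatenate stored phoneme data, adding silence at pause positions.

    pause_after holds indices into units after which a pause is inserted.
    """
    voice_data = []
    for index, unit in enumerate(units):
        voice_data.extend(store[unit])
        if index in pause_after:
            voice_data.extend([0.0] * pause_samples)
    return voice_data


voice = synthesize(["wa", "ta", "shi"], PHONEME_STORE, pause_after={1})
```
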
[0119] This voice data is supplied to the D/A (Digital/
Analog) converting unit 33, and there is subjected to D/A
conversion to analog voice signals. The voice signals
are supplied to the speaker 18, thereby outputting the
synthesized sound corresponding to the text input to the
text generating unit 31.

[0120] The voice synthesizing unit 55 thus configured
receives supply of action instruction information
containing text which is the object of voice synthesizing from
the action determining mechanism unit 52, also receives
supply of the state of emotions and instincts from the
emotion/instinct model unit 51, and further receives
supply of action information from the action determining
mechanism unit 52, and the text generating unit 31 and
rules synthesizing unit 32 perform voice synthesizing
processing taking the state of emotions and instincts
and the action information into consideration.
[0121] Now, the voice synthesizing processing performed
while taking the state of emotions and instincts
into consideration will be described, with reference to
the flowchart in Fig. 12. At the point that the action
determining mechanism unit 52 outputs the action instruction
information containing text which is the object of
voice synthesizing to the voice synthesizing unit 55, the
text generating unit 31 receives the action instruction
information in step S21, and the flow proceeds to step
S22. At step S22, the state of emotions and instincts of
the robot is recognized in the text generating unit 31
and rules synthesizing unit 32 by making reference
to the emotion/instinct model unit 51, and the flow
proceeds to step S23.

[0122] In step S23, at the text generating unit 31, the
vocabulary (speech vocabulary) used for generating
text to be actually output as synthesized sound (hereafter
also referred to as "speech text") is set from the text
contained in the action instruction information from the
action determining mechanism unit 52, based on the
emotions and instincts of the robot, and the flow
proceeds to step S24. In step S24, at the text generating
unit 31, speech text corresponding to the text contained
in the action instruction information is generated using
the speech vocabulary set in step S23.
[0123] That is, the text contained in the action instruction
information from the action determining mechanism
unit 52 is such that it presupposes speech in a standard
state of emotions and instincts, and in step S24 the text
is corrected taking into consideration the state of
emotions and instincts of the robot, thereby generating
speech text.

[0124] Specifically, in the event that the text contained
in the action instruction information is "What is it?" for
example, and the emotion state of the robot represents
"angry", the text is generated as speech text of "Yeah,
what?" to indicate anger. Also, in the event that the text
contained in the action instruction information is "Please
stop" for example, and the emotion state of the robot
represents "angry", the text is generated as speech text
of "Quit it!" to indicate anger.
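
The generation of speech text from standard text can be sketched as a
lookup keyed by emotion state. The two phrase pairs come from the
text; holding them in a lookup table, and passing unlisted text
through unchanged, are assumptions of this sketch rather than the
method of the text generating unit 31.

```python
# Phrase pairs from the text; the table form itself is an assumption.
ANGRY_SPEECH = {
    "What is it?": "Yeah, what?",
    "Please stop": "Quit it!",
}


def generate_speech_text(text, emotion):
    """Rewrite standard text into speech text reflecting the emotion state."""
    if emotion == "angry":
        return ANGRY_SPEECH.get(text, text)  # unlisted text passes through
    return text


angry_reply = generate_speech_text("What is it?", "angry")
calm_reply = generate_speech_text("What is it?", "not angry")
```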

[0125] Then, the flow proceeds to step S25, the text
generating unit 31 performs text analysis of the speech
text such as morpheme analysis and grammatical structure
analysis, and generates pitch information such as
pitch frequency, power, duration, etc., serving as
information necessary for performing rule voice synthesizing
regarding the speech text. Further, the text generating
unit 31 also generates phonemics information such as
the pronunciation of each word making up the speech
text. Here, in step S25, standard phonemics information
is generated for the phonemics information of the
speech text.

[0126] Subsequently, in step S26, the text generating
unit 31 corrects the phonemics information of the
speech text set in step S25 based on the state of
emotions and instincts of the robot, thereby giving greater
emotional expressions at the point of outputting the
speech text as synthesized sound.
[0127] Now, the details of the relation between emotion
and speech are described in, e.g., "Conveyance of
Paralinguistic Information by Speech: From the
Perspective of Linguistics", MAEKAWA, Acoustical Society
of Japan 1997 Fall Meeting Papers Vol. 1-3-10, pp.
381-384, September 1997, etc.

[0128] The phonemics information and pitch information
of the speech text obtained at the text generating
unit 31 is supplied to the rules synthesizing unit 32, and
in step S27, at the rules synthesizing unit 32, rule voice
synthesizing is performed following the phonemics
information and pitch information, thereby generating
digital data of the synthesized sound of the speech text.
Now, at the rules synthesizing unit 32 also, pitch such
as the position of pausing, the position of accent,
intonation, etc., of the synthesized sound, is changed so as
to appropriately express the state of emotions and
instincts of the robot, based on the state of emotions and
instincts thereof.

[0129] The digital data of the synthesized sound obtained
at the rules synthesizing unit 32 is supplied to the
D/A converting unit 33. In step S28, at the D/A converting
unit 33, digital data from the rules synthesizing unit
32 is subjected to D/A conversion, and supplied to the
speaker 18, thereby ending processing. Thus, synthesized
sound of the speech text which has pitch reflecting
the state of emotions and instincts of the robot is output
from the speaker 18.
[0130] Next, the voice synthesizing processing which
is performed taking into account the action information
will be described with reference to the flowchart in Fig.
13.

[0131] At the point that the action determining mechanism
unit 52 outputs the action instruction information
containing text which is the object of voice synthesizing
to the voice synthesizing unit 55, the text generating unit
31 receives the action instruction information in step
S31, and the flow proceeds to step S32. At step S32,
the current action of the robot is confirmed in the text
generating unit 31 and rules synthesizing unit 32 by
making reference to the action information output by the
action determining mechanism unit 52, and the flow
proceeds to step S33.

[0132] In step S33, at the text generating unit 31, the
vocabulary (speech vocabulary) used for generating
speech text is set from the text contained in the action
instruction information from the action determining
mechanism unit 52, based on action information, and
speech text corresponding to the text contained in the
action instruction information is generated using the
speech vocabulary.

[0133] Then the flow proceeds to step S34, where the text
generating unit 31 performs morpheme analysis and
grammatical structure analysis of the speech text, and
generates pitch information such as pitch frequency,
power, duration, etc., serving as information necessary
for performing rule voice synthesizing regarding the
speech text. Further, the text generating unit 31 also
generates phonemics information such as the pronunciation
of each word making up the speech text. Here,
in step S34 as well, standard pitch information is generated
for the pitch information of the speech text, in the
same manner as with step S25 in Fig. 12.
[0134] Subsequently, in step S35, the text generating
unit 31 corrects the pitch information of the speech text
generated in step S34 based on the action information.
[0135] That is, in the event that the robot is walking,
for example, there is noise from the driving of the actuators
3AA1 through 5A1 and 5A2 as described above. On
the other hand, in the event that the robot is sitting or
lying prostrate, there is no such noise. Accordingly, the
synthesized sound is harder to hear in the event that the
robot is walking, in comparison to cases wherein the robot
is sitting or lying prostrate.
[0136] Thus, in the event that the action information
indicates the robot is walking, the text generating unit
31 corrects the pitch information so as to slow the
speech speed of the synthesized sound or increase the
power thereof, thereby making the synthesized sound
more readily understood.
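
The correction of step S35 can be illustrated as follows. This is a minimal Python sketch under stated assumptions: the PitchInfo structure, the function correct_pitch_info and the two correction factors are hypothetical, since no concrete values are given in the description. For simplicity the sketch applies both corrections (slower speech and greater power) while walking, whereas the description allows either one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PitchInfo:
    """Hypothetical container for the pitch information of a speech text."""
    pitch_frequency_hz: float
    power: float            # relative output level, 1.0 = standard
    duration_scale: float   # 1.0 = standard speed, larger = slower speech


# Illustrative factors only; not taken from the description.
WALKING_POWER_FACTOR = 1.5
WALKING_DURATION_FACTOR = 1.2


def correct_pitch_info(info: PitchInfo, action: str) -> PitchInfo:
    """Return pitch information corrected for the current action."""
    if action == "walking":
        # Actuator noise is present, so speak more slowly and more loudly.
        return replace(
            info,
            power=info.power * WALKING_POWER_FACTOR,
            duration_scale=info.duration_scale * WALKING_DURATION_FACTOR,
        )
    # Sitting or lying prostrate: the standard pitch information is kept.
    return info
```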

[0137] In other arrangements, correction may be
made in step S35 such that the pitch frequency value
differs depending on whether the action information indicates
that the robot is on its side or standing.
[0138] The phonemics information and pitch information
of the speech text obtained at the text generating
unit 31 are supplied to the rules synthesizing unit 32, and
in step S36, at the rules synthesizing unit 32, rule voice
synthesizing is performed following the phonemics information
and pitch information, thereby generating digital
data of the synthesized sound of the speech text.
Now, at the rules synthesizing unit 32 also, the position
of pausing, the position of accent, intonation, etc., of the
synthesized sound is changed as necessary, at the time
of rule voice synthesizing.
[0139] The digital data of the synthesized sound obtained
at the rules synthesizing unit 32 is supplied to the
D/A converting unit 33. In step S37, at the D/A converting
unit 33, digital data from the rules synthesizing unit
32 is subjected to D/A conversion, and supplied to the
speaker 18, thereby ending processing.

[0140] Incidentally, in the event of generating synthesized
sound at the voice synthesizing unit 55 taking into
consideration the state of emotions and instincts, and
the action information, the output of such synthesized
sound and the actions of the robot may be synchronized
in a way.

[0141] That is, for example, in the event that the emotion
state represents "not angry", and the synthesized
sound "What is it?" is to be output taking the state of
emotion into consideration, the robot may be made to
face the user in a manner synchronous with the output
of the synthesized sound. On the other hand, for example,
in the event that the emotion state represents "angry",
and the synthesized sound "Yeah, what?" is to be
output taking the state of emotion into consideration, the
robot may be made to face the other way in a manner
synchronous with the output of the synthesized sound.
[0142] Also, an arrangement may be made wherein,
in the event of output of the synthesized sound "What
is it?", the robot is made to act at normal speed, and
wherein in the event of output of the synthesized sound
"Yeah, what?", the robot is made to act at a speed slower
than normal, in a sullen and unwilling manner. In this
case, the robot can express emotions to the user with
both motions and synthesized sound.
[0143] Further, at the action determining mechanism
unit 52, the next action is determined based on an action
model represented by a finite automaton such as shown
in Fig. 6, and the contents of the text output as synthesized
sound can be correlated with the transition of state
in the action model in Fig. 6.

[0144] That is, for example, in the event of making
transition from the state corresponding to the action "sitting"
to the state corresponding to the action "standing",
a text such as "Here goes!" can be correlated thereto.
In this case, in the event of the robot making transition
from a sitting position to a standing position, the synthesized
sound "Here goes!" can be output in a manner
synchronous with the transition in position.
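
The correlation between state transitions and text can be sketched as a table keyed by the pair of states. The following Python fragment is illustrative only: TRANSITION_TEXT and text_for_transition are hypothetical names, and the single entry is the example given above; how the action model of Fig. 6 actually stores such a correlation is not specified.

```python
from typing import Optional

# Hypothetical table: (current state, next state) -> text to synthesize.
TRANSITION_TEXT = {
    ("sitting", "standing"): "Here goes!",
}


def text_for_transition(current_state: str, next_state: str) -> Optional[str]:
    """Return the text correlated with a state transition, or None
    when no text is correlated with that transition."""
    return TRANSITION_TEXT.get((current_state, next_state))
```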
[0145] As described above, a robot with a high entertainment
nature can be provided by controlling the voice
synthesizing processing and voice recognizing processing,
based on the state of the robot.
[0146] Next, Fig. 14 illustrates a configuration example
of the image recognizing unit 50B making up the sensor
input processing unit 50 shown in Fig. 3.
[0147] Image signals output from the CCD camera 16
are supplied to the A/D converting unit 41, and there are
subjected to A/D conversion, thereby becoming digital
image data. This digital image data is supplied to the
image processing unit 42. At the image processing unit
42, predetermined image processing such as DCT (Discrete
Cosine Transform) and the like, for example, is performed
on the image data from the A/D converting unit
41, and this is supplied to the recognition collation unit
43.
[0148] The recognition collation unit 43 calculates the
distance between each of multiple image patterns
stored in the image pattern storing unit 44 and the output
of the image processing unit 42, and detects the image
pattern with the smallest distance. Then, the recognition
collation unit 43 recognizes the image taken with
the CCD camera 16, and outputs the recognition results
as state recognition information to the emotion/instinct
model unit 51 and action determining mechanism unit
52, based on the detected image pattern.
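
The collation described here amounts to a nearest-pattern search. The following Python fragment is a minimal sketch, assuming that the output of the image processing unit and the stored image patterns are feature vectors of equal length and that Euclidean distance is used; the description does not state which distance measure is employed, and the name nearest_pattern is hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def nearest_pattern(
    features: Sequence[float],
    stored_patterns: Dict[str, Sequence[float]],
) -> Tuple[Optional[str], float]:
    """Return (name, distance) of the stored pattern closest to `features`.

    Assumption: Euclidean distance between equal-length vectors.
    Returns (None, inf) when no patterns are stored.
    """
    best_name, best_distance = None, math.inf
    for name, pattern in stored_patterns.items():
        distance = math.dist(features, pattern)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance
```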
[0149] Now, the configuration shown in the block diagram
in Fig. 3 is realized by the CPU 10A executing control
programs, as described above. Now, taking only the
power of the CPU 10A (hereafter also referred to simply
as "CPU power") into consideration as a resource necessary
for realizing the voice recognizing unit 50A, the
CPU power is uniquely determined by the hardware employed
for the CPU 10A, and the processing amount (the
processing amount per unit time) which can be executed
by the CPU power is also uniquely determined.
[0150] On the other hand, in the processing to be executed
by the CPU 10A, there is processing which
should be performed with priority over the voice recognition
processing (hereafter also referred to as "priority
processing"), and accordingly, in the event that the load
of the CPU 10A for priority processing increases, the
CPU power which can be appropriated to voice recognition
processing decreases.
[0151] That is, representing the load on the CPU 10A
regarding priority processing as x%, and representing
the CPU power which can be appropriated to voice recognition
processing as y%, the relation between x and
y is represented by the expression

x + y = 100%

and is as shown in Fig. 15.
[0152] Accordingly, in the event that the load for priority
processing is 0%, 100% of the CPU power can be
appropriated to voice recognition processing. Also, in
the event that the load regarding priority processing is
S% (0 < S < 100), (100 - S)% of the CPU power can be
appropriated. Also, in the event that the load for priority
processing is 100%, no CPU power can be appropriated
to voice recognition processing.
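
The relation x + y = 100% can be written out directly. The following Python fragment is a minimal sketch; the function name available_cpu_power and the range check are assumptions added for illustration.

```python
def available_cpu_power(priority_load_percent: float) -> float:
    """Return y, the CPU power (in percent) that can be appropriated to
    voice recognition processing, given x, the load of priority
    processing (in percent), from the relation x + y = 100."""
    if not 0.0 <= priority_load_percent <= 100.0:
        raise ValueError("priority load must be between 0 and 100 percent")
    return 100.0 - priority_load_percent
```

For a priority load of 30%, for example, the sketch yields 70% for voice recognition processing.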
[0153] Now, for example, in the event that the robot is
walking, and CPU power appropriated to
the processing for the action of "walking" (hereafter also
referred to as "walking processing") is insufficient, the
walking speed becomes slow, and in a worst-case scenario,
the robot may stop walking. Such slowing or stopping
while walking appears unnatural to the user, so there is the
need to prevent such a state if at all possible; accordingly,
it can be said that the walking processing performed
while the robot is walking must be performed
with priority over the voice recognition processing.
[0154] That is, in the event that the processing currently
being carried out is obstructed by voice recognition
processing being performed and the movement of
the robot is no longer smooth due to this, the user will
sense that this is unnatural. Accordingly, it can be said
that basically, the processing being currently performed
must be performed with priority over the voice recognition
processing, and that voice recognition processing
should be performed within a range so as to not obstruct
the processing being currently performed.
[0155] To this end, the action determining mechanism
unit 52 is arranged so as to recognize the action being
currently taken by the robot, and to control voice recognition
processing by the voice recognizing unit 50A,
based on the load corresponding to the action.
[0156] That is, as shown in the flowchart in Fig. 16, in
step S41, the action determining mechanism unit 52 recognizes
the action being taken by the robot, based on
the action model which it itself manages, and the flow
proceeds to step S42. In step S42, the action determining
mechanism unit 52 recognizes the load regarding
the processing for continuing the current action recognized
in step S41 in the same manner (i.e., maintaining
the action).

[0157] Now, the load corresponding to the processing
for continuing the current action in the same manner can
be obtained by predetermined calculations. Also, the
load can be obtained by preparing beforehand
a table correlating actions and estimated CPU power for
performing processing corresponding to the actions,
and making reference to the table. Note that less
processing amount is required for the table than for
calculation.
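
The table-based approach can be sketched as below. This Python fragment is illustrative only: ACTION_LOAD_TABLE, load_for_action and all percentage values are hypothetical, as the description gives no concrete load figures. Treating an unregistered action as a 100% load is a conservative assumption made for the sketch.

```python
# Hypothetical table correlating actions with the estimated CPU power
# (in percent) needed to continue each action. Values are illustrative.
ACTION_LOAD_TABLE = {
    "lying": 10.0,
    "sitting": 20.0,
    "standing": 40.0,
    "walking": 80.0,
}


def load_for_action(action: str, default_load: float = 100.0) -> float:
    """Look up the estimated load for the current action.

    Assumption: an action missing from the table is treated as
    occupying the full CPU, so voice recognition never obstructs it.
    """
    return ACTION_LOAD_TABLE.get(action, default_load)
```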

[0158] Following obtaining the load corresponding to
the processing for continuing the current action in the
same manner, the flow proceeds to step S43, and the
action determining mechanism unit 52 obtains the CPU
power which can be appropriated to voice recognizing
processing, based on the load, from the relationship
shown in Fig. 15. Further, the action determining mechanism
unit 52 performs various types of control relating
to voice recognizing processing based on the CPU power
which can be appropriated to the voice recognizing
processing, the flow returns to step S41, and subsequently
the same processing is repeated.
[0159] That is, the action determining mechanism unit
52 changes the word dictionaries used for voice recognizing
processing, based on the CPU power which can
be appropriated to the voice recognizing processing.
Specifically, in the event that sufficient CPU power can
be appropriated to the voice recognizing processing,
settings are made such that dictionaries with a great
number of words registered therein are used for voice
recognizing processing. Also, in the event that sufficient
CPU power cannot be appropriated to the voice recognizing
processing, settings are made such that dictionaries
with few words registered therein are used for
voice recognizing.

[0160] Further, in the event that practically no CPU
power can be appropriated to voice recognizing
processing, the action determining mechanism unit 52
puts the voice recognizing unit 50A to sleep (a state
wherein no voice recognizing processing is performed).
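
The control of the voice recognizing unit according to the available CPU power can be sketched as a three-way selection. The following Python fragment is a minimal sketch under stated assumptions: the RecognizerMode enumeration, the function select_recognizer_mode and both thresholds are hypothetical, since the description does not quantify "sufficient" or "practically no" CPU power.

```python
from enum import Enum


class RecognizerMode(Enum):
    LARGE_DICTIONARY = "large"   # dictionaries with many registered words
    SMALL_DICTIONARY = "small"   # dictionaries with few registered words
    SLEEP = "sleep"              # no voice recognizing processing


# Illustrative thresholds in percent of CPU power; not from the description.
SUFFICIENT_POWER_PERCENT = 50.0
MINIMUM_POWER_PERCENT = 5.0


def select_recognizer_mode(available_power_percent: float) -> RecognizerMode:
    """Choose how voice recognition runs for the available CPU power."""
    if available_power_percent < MINIMUM_POWER_PERCENT:
        return RecognizerMode.SLEEP
    if available_power_percent >= SUFFICIENT_POWER_PERCENT:
        return RecognizerMode.LARGE_DICTIONARY
    return RecognizerMode.SMALL_DICTIONARY
```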
[0161] Also, the action determining mechanism unit
52 causes the robot to take actions corresponding to the
CPU power which can be appropriated to voice recognizing
processing.

[0162] That is, in the event that practically no CPU
power can be appropriated to voice recognizing
processing, or in the event that sufficient CPU power
cannot be appropriated thereto, no voice recognizing
processing is performed, or the voice recognizing precision
and processing speed may deteriorate, giving the
user an unnatural sensation.
[0163] Accordingly, in the event that practically no
CPU power can be appropriated to voice recognizing
processing, or in the event that sufficient CPU power
cannot be appropriated thereto, the action determining
mechanism unit 52 causes the robot to take listless actions
or actions such as cocking the head, thereby notifying
the user that voice recognition is difficult.
[0164] Also, in the event that sufficient CPU power
can be appropriated to voice recognizing processing,
the action determining mechanism unit 52 causes the
robot to take energetic actions or actions such as nodding
the head, thereby notifying the user that voice recognition
is sufficiently available.

[0165] In addition to the robot taking such actions
as described above to notify the user whether voice recognition
processing is available or not, arrangements
may be made wherein special sounds such as "beep-beep-beep"
or "tinkle-tinkle-tinkle", or predetermined
synthesized sound messages, are output from the
speaker 18.

[0166] Also, in the event that the robot has a liquid
crystal panel, the user can be notified regarding whether
voice recognition processing is available or not by displaying
predetermined messages on the liquid crystal
panel. Further, in the event that the robot has a mechanism
for expressing facial expressions such as blinking
and so forth, the user can be notified regarding whether
voice recognition processing is available or not by such
changes in facial expressions.
[0167] Note that while only the CPU power has been
dealt with in the above case, other resources for
voice recognition processing (e.g., available space on
the memory 10B, etc.) may also be the object of such
managing.

[0168] Further, in the above, description has been
made with focus on the relation between voice recognition
processing at the voice recognizing unit 50A and
other processing, but the same can be said regarding
the relation between image recognizing processing at
the image recognizing unit 50B and other processing,
voice synthesizing processing at the voice synthesizing
unit 55 and other processing, and so forth.
[0169] The above has been a description of an ar- 
rangement wherein the present invention has been ap- 
plied to an entertainment robot (i.e., a robot serving as 
a pseudo pet), but the present invention is by no means 
restricted to this application; rather, the present inven- 
tion can be widely applied to various types of robots, 
such as industrial robots, for example. 
[0170] Further, in the present embodiment, the
above-described series of processing is performed by
the CPU 10A executing programs, but the series of
processing may be carried out by dedicated hardware
for each.

[0171] Also, in addition to storing the programs on the
memory 10B (see Fig. 2) beforehand, the programs may
be temporarily or permanently stored (recorded) on removable
recording media such as floppy disks, CD-ROMs
(Compact Disk Read-Only Memory), MO (Magneto-Optical)
disks, DVDs (Digital Versatile Disks), magnetic
disks, semiconductor memory, etc. Such removable
recording media may be provided as so-called packaged
software, so as to be installed in the robot (memory
10B).

[0172] Also, in addition to installing the programs from
removable recording media, arrangements may be
made wherein the programs are transferred from a
download site in a wireless manner via a digital broadcast
satellite, or by cable via networks such as LANs
(Local Area Networks) or the Internet, and thus installed
to the memory 10B.

[0173] In this case, in the event that a newer version
of the program is released, the newer version can be
easily installed to the memory 10B.
[0174] Now, in the present specification, the processing
steps describing the program for causing the CPU
10A to perform various types of processing do not necessarily
need to be processed in the time-sequence following
the order described in the flowcharts; rather, the
present specification includes arrangements wherein
the steps are processed in parallel or individually (e.g.,
parallel processing or processing by objects).
[0175] Also, the programs may be processed by a single
CPU, or the processing thereof may be dispersed
between multiple CPUs and thus processed.
[0176] In so far as the embodiments of the invention
described above are implemented, at least in part, using
software-controlled data processing apparatus, it will be
appreciated that a computer program providing such
software control and a storage medium by which such
a computer program is stored are envisaged as aspects
of the present invention.

Claims

1. A voice processing device built into a robot, said
voice processing device comprising:

voice processing means for processing voice;
and

control means for controlling voice processing
by said voice processing means, based on the
state of said robot.

2. A voice processing device according to Claim 1,
wherein said control means control said voice process
based on the state of actions, emotions or instincts
of said robot.

3. A voice processing device according to Claim 1,
wherein said voice processing means comprises
voice synthesizing means for performing voice synthesizing
processing and outputting synthesized
sound;

and wherein said control means control the voice
synthesizing processing by said voice synthesizing
means, based on the state of said robot.

4. A voice processing device according to Claim 3,
wherein said control means control phonemics information
and pitch information output by said voice
synthesizing means.

5. A voice processing device according to Claim 3,
wherein said control means control the speech
speed or volume of synthesized sound output by
said voice synthesizing means.

6. A voice processing device according to Claim 1,
wherein said voice processing means extract the
control pitch information or phonemics information
of the input voice;

and wherein the emotion state of said robot is
changed based on said pitch information or phonemics
information, or said robot takes actions corresponding
to said pitch information or phonemics
information.

7. A voice processing device according to Claim 1,
wherein said voice processing means comprises
voice recognizing means for recognizing input
voice;

and wherein said robot takes actions corresponding
to the reliability of the voice recognition results output
from said voice recognizing means, or the emotion
state of said robot is changed based on said
reliability.

8. A voice processing device according to Claim 1,
wherein said control means recognizes the action
which said robot is taking, and controls voice
processing by said voice processing means based
on the load regarding that action.

9. A voice processing device according to Claim 8,
wherein said robot takes actions corresponding to
resources which can be appropriated to voice
processing by said voice processing means.

10. A voice processing method for a voice processing
device built into a robot, said method comprising:

a voice processing step for processing voice;
and

a control step for controlling voice processing
in said voice processing step, based on the
state of said robot.

11. A recording medium recording programs to be executed
by a computer, for causing a robot to perform
voice processing, said program comprising:

a voice processing step for processing voice;
and

a control step for controlling voice processing
in said voice processing step, based on the
state of said robot.